A reverberation-time-aware DNN approach leveraging spatial information for microphone array dereverberation

نویسندگان

  • Bo Wu
  • Minglei Yang
  • Kehuang Li
  • Zhen Huang
  • Sabato Marco Siniscalchi
  • Tong Wang
  • Chin-Hui Lee
چکیده

A reverberation-time-aware deep-neural-network (DNN)-based multi-channel speech dereverberation framework is proposed to handle a wide range of reverberation times (RT60s). There are three key steps in designing a robust system. First, to accomplish simultaneous speech dereverberation and beamforming, we propose a framework, namely DNNSpatial, by selectively concatenating log-power spectral (LPS) input features of reverberant speech from multiple microphones in an array and map them into the expected output LPS features of anechoic reference speech based on a single deep neural network (DNN). Next, the temporal auto-correlation function of received signals at different RT60s is investigated to show that RT60-dependent temporal-spatial contexts in feature selection are needed in the DNNSpatial training stage in order to optimize the system performance in diverse reverberant environments. Finally, the RT60 is estimated to select the proper temporal and spatial contexts before feeding the log-power spectrum features to the trained DNNs for speech dereverberation. The experimental evidence gathered in this study indicates that the proposed framework outperforms the state-of-the-art signal processing dereverberation algorithm weighted prediction error (WPE) and conventional DNNSpatial systems without taking the reverberation time into account, even for extremely weak and severe reverberant conditions. The proposed technique generalizes well to unseen room size, array geometry and loudspeaker position, and is robust to reverberation time estimation error.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Joint Training of Multi-Channel-Condition Dereverberation and Acoustic Modeling of Microphone Array Speech for Robust Distant Speech Recognition

We propose a novel data utilization strategy, called multichannel-condition learning, leveraging upon complementary information captured in microphone array speech to jointly train dereverberation and acoustic deep neural network (DNN) models for robust distant speech recognition. Experimental results, with a single automatic speech recognition (ASR) system, on the REVERB2014 simulated evaluati...

متن کامل

Ensemble of Jointly Trained Deep Neural Network-Based Acoustic Models for Reverberant Speech Recognition

Distant speech recognition is a challenge, particularly due to the corruption of speech signals by reverberation caused by large distances between the speaker and microphone. In order to cope with a wide range of reverberations in real-world situations, we present novel approaches for acoustic modeling including an ensemble of deep neural networks (DNNs) and an ensemble of jointly trained DNNs....

متن کامل

A microphone array processing technique for speech enhancement in a reverberant space

In this paper, a new microphone array processing technique is proposed for blind dereverberation of speech signals affected by room acoustics. It is based on the separate processing of the minimum-phase and all-pass components of delay-steered multi-microphone signals. The minimum-phase components are processed in the cepstrum-domain, where spatial averaging followed by low-time filtering is ap...

متن کامل

A Microphone Array System for Speech Source Localization, Denoising, and Dereverberation

There is a great deal of potential for advancement in distant-talker speech acquisition research, and a wealth of current and future technology depends upon these advances. The goal of this work is to allow users the opportunity to roam unfettered in diverse environments while still providing a high quality speech signal and a robustness to background noise and reverberation effects. In this th...

متن کامل

Real-Time Dereverberation for Deep Neural Network Speech Recognition

We evaluate a real-time multi-channel dereverberation method for the application to speech recognition with deep neural networks (DNN). The dereverberation method is based on modeling the reverberated signal as a mixture of a fully coherent direct path signal and a diffuse reverberation component, and estimating the coherentto-diffuse power ratio (CDR) from the spatial coherence of the signals....

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:
  • EURASIP J. Adv. Sig. Proc.

دوره 2017  شماره 

صفحات  -

تاریخ انتشار 2017